Estimating disease epidemiology at a smaller scale

Christopher Jackson
MRC Biostatistics Unit, University of Cambridge

Background

National agencies usually publish disease data (e.g. mortality, incidence, prevalence) for large population subgroups, commonly including

  • age group and sex

  • area

Health impact models need these data for smaller subgroups, e.g. those defined by

  • smaller areas

  • socioeconomic indicators (deprivation indices, education, etc.)

  • combinations of all of the above

In a microsimulation model, synthetic individuals are labelled with several characteristics. How finely can we describe their disease risk?

Different sources of data

National agencies may publish cross-tabulations of disease outcomes:

  • by age/sex, by small area, by socioeconomic indicators separately

  • but not by all of these factors jointly

There may also be cohort studies or other literature giving estimates of disease outcomes by particular predictors (e.g. education).

Make the most of all sources of data

We cannot know for certain what is not observed, but we can estimate it under clear assumptions.

For example: an assumption that some risk factors act independently on the risk of a disease outcome.

Example: mortality in Melbourne

  • Mortality rates by year of age and sex, for whole state.

  • Standardised mortality rates \(r^{std}_i\) by small areas \(i\)

    • Expected number of deaths in the area if the age/sex balance of the area were the same as the standard population

    • Convert to excess rate \(r^{std}_i / r^{std}_{ave}\) relative to average

    • Here \(r^{std}_{ave}\) is the average of the area-specific rates, weighted by area population \(n_i\): \(r^{std}_{ave} = \sum_i n_i r^{std}_i / \sum_i n_i\)
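
As a minimal sketch of the conversion above, the following Python computes the population-weighted average of the standardised rates and the excess rate for each area. The three areas, their rates and their populations are made-up illustrative numbers, not Melbourne data, and the function name `excess_rates` is hypothetical.

```python
def excess_rates(std_rates, populations):
    """Return each area's standardised rate relative to the weighted average."""
    total_pop = sum(populations)
    # Population-weighted average of the area-specific standardised rates
    r_ave = sum(n * r for n, r in zip(populations, std_rates)) / total_pop
    return [r / r_ave for r in std_rates]


# Illustrative (made-up) values: standardised rates and area populations
std_rates = [4.0, 5.0, 8.0]
populations = [1000, 2000, 1000]
excess = excess_rates(std_rates, populations)
```

By construction, the population-weighted average of the excess rates is 1, so areas with values above 1 have higher than average standardised mortality.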

Disaggregating disease outcomes: using standardised rates

Estimate the small-area-specific rate for a particular age and sex as

  • large-area rate by age/sex, multiplied by

  • excess age/sex standardised rate for small area

General principle: estimate rate by risk factors A x B as

  • rate by risk factor A, multiplied by

  • excess rate (standardised relative to A) for risk factor B

Assumption: risk factors act independently
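
The general principle can be sketched in a few lines of Python. This is only an illustration under the independence assumption: the function name `disaggregate`, the age/sex rates and the area excess rates are all hypothetical values chosen for the example.

```python
def disaggregate(rate_by_a, excess_by_b):
    """Estimate the rate for every A x B combination as rate(A) * excess(B)."""
    return {
        (a, b): rate_a * excess_b
        for a, rate_a in rate_by_a.items()
        for b, excess_b in excess_by_b.items()
    }


# Made-up large-area mortality rates by (year of age, sex)
state_rates = {(70, "F"): 0.015, (70, "M"): 0.022}
# Made-up excess standardised rates by small area (relative to the average)
area_excess = {"area_1": 0.9, "area_2": 1.2}

# Keys of the result are ((age, sex), area)
area_rates = disaggregate(state_rates, area_excess)
```

Because every age/sex rate is scaled by the same area-level factor, the age and sex pattern of mortality is assumed to be identical in every small area.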

Results

[Figures: mortality by age and sex; mortality by area; mortality disaggregated by year of age, sex and area]

Example: mortality in Melbourne, by education

We now have (estimates of) mortality by year of age, sex and small area.

Now we want to disaggregate this further to account for socioeconomic variations within each area.

Use data on

  • the relative risk of mortality between high and low education (by broad age group), assuming this effect is the same in all small areas

  • the proportion of people in each area with different levels of education

Hence infer mortality in each area by year of age, sex, and education

Disaggregating disease outcomes

Take the mortality data for a specific group (e.g. year of age, sex, small area)

Died    Survived
6%      94%

Low education    High education
60%              40%

The problem is to fill in the 2 x 2 table:

                  Died    Survived    Total
Low education                         60%
High education                        40%
Total             6%      94%         100%

Given knowledge of one cell, we can deduce the other cells

                  Died    Survived    Total
Low education     x       60%-x       60%
High education    6%-x    34%+x       40%
Total             6%      94%         100%

If we know the relative risk of death between the two education groups, we can deduce x, and fill in the whole table.
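
A minimal sketch of this calculation in Python, using the margins from the table (6% died, 60% low education). The relative risk of 1.5 for low versus high education is a made-up value for illustration, and the function name `fill_two_by_two` is hypothetical.

```python
def fill_two_by_two(p_died, p_low, rr_low_vs_high):
    """Fill the education-by-death table from its margins and a relative risk."""
    p_high = 1.0 - p_low
    # Overall risk is a weighted average of the two group risks:
    # p_died = p_low * (rr * risk_high) + p_high * risk_high
    risk_high = p_died / (p_low * rr_low_vs_high + p_high)
    risk_low = rr_low_vs_high * risk_high
    x = p_low * risk_low  # proportion who have low education and died
    return {
        ("low", "died"): x,
        ("low", "survived"): p_low - x,
        ("high", "died"): p_died - x,
        ("high", "survived"): p_high - (p_died - x),
    }


# Margins from the table; the relative risk of 1.5 is an assumed value
table = fill_two_by_two(p_died=0.06, p_low=0.60, rr_low_vs_high=1.5)
```

With these inputs x is about 4.2%, so roughly 6.9% of the low education group and 4.6% of the high education group die, and the four cells still sum to 100%.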

Algebraic explanation: we know, for some population:

  • \(r_{ave}\): average mortality rate

  • \(p_0, p_1 = 1 - p_0\): proportions without/with the risk factor

  • \(RR = r_1 / r_0\): relative mortality with versus without the risk factor

and we want

  • \(r_0, r_1\): mortality in this group without/with the risk factor

\[ \begin{split} r_{ave} & = p_0 r_0 + p_1 r_1 \\ & = p_0 r_0 + p_1 r_0 RR \\ r_0 & = r_{ave} / (p_1 RR + p_0) \end{split} \]

Generalisation to risk factors with more than two categories

e.g. socioeconomic index, with levels \(i = 1, 2, \ldots, G\). We know

  • \(p_i\): proportion of population in level \(i\)

  • \(RR_i\): relative risk of mortality for level \(i\), compared to level 1 (so \(RR_1 = 1\))

and want to obtain \(r_i\), the absolute mortality in level \(i\).

\[ r_{ave} = p_1 r_1 + p_2 r_2 + ... = r_1 \sum_{i=1}^G p_i RR_i \]

which gives \(r_1 = r_{ave} / \sum_{i=1}^G p_i RR_i\) in terms of known quantities.

Then compute \(r_i = r_1 RR_i\).
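
The two steps above can be sketched as follows. The function name `rates_by_level`, the average rate, the proportions and the relative risks are all made-up values for illustration; the first level is taken as the reference with relative risk 1.

```python
def rates_by_level(r_ave, proportions, relative_risks):
    """Absolute rates r_i from the average rate, proportions p_i and RR_i.

    relative_risks[0] is the reference level and is expected to equal 1.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # r_ave = r_1 * sum_i p_i RR_i, so solve for the reference-level rate r_1
    r_ref = r_ave / sum(p * rr for p, rr in zip(proportions, relative_risks))
    # Each level's rate is the reference rate scaled by its relative risk
    return [r_ref * rr for rr in relative_risks]


# Made-up inputs: three socioeconomic levels
proportions = [0.5, 0.3, 0.2]
relative_risks = [1.0, 1.5, 2.0]
rates = rates_by_level(0.06, proportions, relative_risks)
```

A useful check is that the population-weighted average of the returned rates reproduces the original average rate. The two-category case is recovered by passing two proportions and relative risks `[1.0, RR]`.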

Summary

  • General principle for disaggregating tabular data on some outcome (e.g. disease mortality, incidence)

  • Estimate joint effects of multiple risk factors, given effect of each separately

  • Requires relative rates/risks and population sizes for each risk factor, or standardised rates

  • Assumes that different risk factors act independently